Designing an A* Algorithm for Calculating Edit Distance between Rooted-Unordered Trees
Authors
Abstract
Tree structures are useful for describing and analyzing biological objects and processes, so there is a need for metrics and algorithms that compare trees. A natural comparison metric is the "Tree Edit Distance": the number of simple edit operations (insertions and deletions) needed to transform one tree into the other. Rooted-ordered trees, in which the order of siblings is significant, can be compared in polynomial time. Rooted-unordered trees describe processes or objects in which the topology, rather than the order or identity of individual nodes, is what matters. In immunology, for example, rooted-unordered trees describe the diversification of immunoglobulin (antibody) genes in the germinal center over time. Comparing such trees has been proven to be NP-complete. Comparing two trees can be viewed as a search problem in a graph. A* is a search algorithm that explores the search space in an efficient order; given a good lower-bound estimate of the difference between the two trees, it can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suited to calculating tree edit distance, and we show that it can compute the edit distance between trees with dozens of nodes in reasonable time.
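The abstract frames tree comparison as a graph search guided by a lower bound. The following is a minimal sketch of that idea under our own assumptions, not the authors' implementation: trees are unlabeled (only topology matters), edits are unit-cost node deletions and insertions in which a deleted node's children are reattached to its parent, and the heuristic is the simple bound |size(F1) - size(F2)|, which holds because every edit changes a forest's size by exactly one. A search state is a set of independent forest-pair subproblems; each step branches on one root v of the first forest: delete v, insert a root w of the second forest, or map v to w (which splits the problem into children-vs-children and rest-vs-rest). All function names (`canon`, `normalize`, `successors`, `tree_edit_distance`) are hypothetical.

```python
import heapq
from itertools import count

# Assumption: a tree is a nested list/tuple of its child subtrees; a leaf is [].
# canon() sorts children recursively so sibling order is irrelevant (unordered trees).

def canon(tree):
    return tuple(sorted(canon(c) for c in tree))

def size(tree):
    return 1 + sum(size(c) for c in tree)

def forest_size(forest):
    return sum(size(t) for t in forest)

def normalize(pairs, g):
    """Drop subproblems whose cost is already known; return (sorted pairs, new g)."""
    kept = []
    for f1, f2 in pairs:
        if not f1 or not f2:
            g += forest_size(f1) + forest_size(f2)  # only deletions or only insertions remain
        elif f1 != f2:
            kept.append((f1, f2))  # identical forests cost 0 and are dropped
    return tuple(sorted(kept)), g

def h(pairs):
    # Lower bound: each insert/delete changes a forest's size by exactly one.
    return sum(abs(forest_size(f1) - forest_size(f2)) for f1, f2 in pairs)

def successors(pairs):
    """Yield (step_cost, new_pairs) by branching on the first root v of the first pair."""
    (f1, f2), rest = pairs[0], list(pairs[1:])
    v, f1_rest = f1[0], f1[1:]
    # 1. Delete v: its children become roots of F1.
    yield 1, rest + [(tuple(sorted(f1_rest + v)), f2)]
    for j, w in enumerate(f2):
        if j > 0 and w == f2[j - 1]:
            continue  # identical sibling subtrees would give identical branches
        f2_rest = f2[:j] + f2[j + 1:]
        # 2. Insert w's root: its children become roots of F2.
        yield 1, rest + [(f1, tuple(sorted(f2_rest + w)))]
        # 3. Map v to w: children match children, remaining trees match remaining trees.
        yield 0, rest + [(v, w), (f1_rest, f2_rest)]

def tree_edit_distance(t1, t2):
    """A* over forest-pair subproblems; returns the unit-cost insert/delete distance."""
    start, g0 = normalize([((canon(t1),), (canon(t2),))], 0)
    tie = count()  # tie-breaker so heapq never compares states directly
    frontier = [(g0 + h(start), next(tie), g0, start)]
    best_g = {start: g0}
    while frontier:
        _, _, g, pairs = heapq.heappop(frontier)
        if g > best_g.get(pairs, float("inf")):
            continue  # stale entry: a cheaper path to this state was found later
        if not pairs:
            return g  # goal: every subproblem has been resolved
        for step, new_pairs in successors(pairs):
            nxt, g2 = normalize(new_pairs, g + step)
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt))
    raise RuntimeError("unreachable: deleting and inserting everything always reaches the goal")

print(tree_edit_distance([[[]]], [[], []]))  # path of 3 nodes vs. star of 3 nodes
```

For example, a three-node path and a three-node star share at most a root and one child in any ancestor-preserving mapping, so the sketch reports a distance of 2. The size-difference bound is deliberately weak; the abstract attributes A*'s speedup to a good lower bound, and a sharper bound could be substituted in `h` as long as it never overestimates the remaining cost.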
Similar resources
Approximate Common Structures in XML Schema Matching
This paper describes a matching algorithm that can find accurate matches and scales to large XML Schemas with hundreds of nodes. We model XML Schemas as labeled, unordered and rooted trees, and turn the schema matching problem into a tree matching problem. We develop a tree matching algorithm based on the concept of Approximate Common Structures. Compared with the tree edit-distance algorithm a...
Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees
Given two rooted, labeled, unordered trees, the common subtree problem is to find a bijective matching between subsets of nodes of the trees of maximum cardinality which preserves labels and ancestry relationship. The tree edit distance problem is to determine the least cost sequence of insertions, deletions and substitutions that converts a tree into another given tree. Both problems are known...
A Polynomial-Time Metric for Attributed Trees
We address the problem of comparing attributed trees and propose a novel distance measure centered around the notion of a maximal similarity common subtree. The proposed measure is general and defined on trees endowed with either symbolic or continuous-valued attributes, and can be equally applied to ordered and unordered, rooted and unrooted trees. We prove that our measure satisfies the metri...
Journal title:
- Journal of Computational Biology: A Journal of Computational Molecular Cell Biology
Volume 13, Issue 6
Pages: -
Publication date: 2006